Spearman Rank Correlation Test (Spearman’s ρ)#
Spearman’s rank correlation answers a simple question:
“As X increases, does Y tend to increase or decrease (not necessarily linearly)?”
It is a nonparametric measure of association because it works on ranks instead of raw values.
Common use-cases:
ordinal data (Likert scales, rankings)
monotonic but nonlinear relationships (e.g., exponential, saturating curves)
data with outliers where Pearson’s correlation is overly sensitive
Learning goals#
By the end you can:
explain what Spearman’s ρ measures (and what it does not measure)
compute ρ from first principles using only NumPy
perform a hypothesis test via a permutation test (p-value without distribution tables)
interpret ρ, p-values, and (bootstrap) confidence intervals
visualize monotonic vs non-monotonic relationships with Plotly
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
try:
from scipy.stats import spearmanr as scipy_spearmanr
except Exception:
scipy_spearmanr = None
1) When should you use Spearman?#
Spearman is a good default when you care about monotonic association:
Pearson (r): “How well does a straight line explain the relationship?”
Spearman (ρ): “How well does a monotonic curve explain the relationship?”
Quick intuition:
If you sort the data by X and Y tends to move in one direction (up or down), Spearman’s ρ tends to be large in magnitude.
If Y goes up then down (U-shape / sinusoidal), Spearman can be near 0 even though there is a strong relationship.
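A minimal numeric check of this intuition, reusing the optional SciPy import from the setup cell (we build a NumPy-only version from scratch in section 2):
x_grid = np.linspace(0, 10, 101)
if scipy_spearmanr is not None:
    rho_mono, _ = scipy_spearmanr(x_grid, np.exp(0.5 * x_grid))  # strictly increasing curve → ρ = 1.0
    rho_u, _ = scipy_spearmanr(x_grid, (x_grid - 5) ** 2)        # symmetric U-shape → ρ = 0.0
    print(rho_mono, rho_u)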
A rough decision table:
| Situation | Prefer |
|---|---|
| Linear relationship, roughly symmetric noise | Pearson |
| Monotonic but nonlinear relationship | Spearman |
| Ordinal measurements (ranks, Likert scales) | Spearman |
| Lots of ties (discrete/rounded data) | Spearman works, but power drops; consider Kendall’s τ |
Assumptions (what the test needs)#
For the usual interpretation, you generally want:
paired observations (xᵢ, yᵢ) from the same unit
independence across units (no repeated-measures without modeling that)
a meaningful ordering of values (ordinal or continuous)
No normality assumption is required for Spearman’s ρ itself.
2) What Spearman’s ρ actually is#
Step 1: replace values by ranks.
If
x = [10, 20, 20, 40]
then
rank(x) = [1, 2.5, 2.5, 4] (ties get the average rank)
Step 2: compute Pearson correlation on those ranks:
$$\rho = \mathrm{corr}(\mathrm{rank}(x), \mathrm{rank}(y))$$
Special case (no ties): you’ll often see
$$\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}$$
where $d_i = \mathrm{rank}(x_i) - \mathrm{rank}(y_i)$.
def drop_nan_pairs(x, y):
x = np.asarray(x, dtype=float)
y = np.asarray(y, dtype=float)
if x.shape != y.shape:
raise ValueError("x and y must have the same shape")
mask = np.isfinite(x) & np.isfinite(y)
return x[mask], y[mask]
def rankdata_average(x):
x = np.asarray(x)
if x.ndim != 1:
raise ValueError("rankdata_average expects a 1D array")
n = x.size
order = np.argsort(x, kind="mergesort")
sorted_x = x[order]
ranks = np.empty(n, dtype=float)
i = 0
while i < n:
j = i
while j + 1 < n and sorted_x[j + 1] == sorted_x[i]:
j += 1
avg_rank = 0.5 * (i + j) + 1.0
ranks[order[i : j + 1]] = avg_rank
i = j + 1
return ranks
def pearson_r(x, y):
x, y = drop_nan_pairs(x, y)
if x.size < 2:
raise ValueError("Need at least 2 paired observations")
xc = x - x.mean()
yc = y - y.mean()
denom = np.sqrt(np.sum(xc**2) * np.sum(yc**2))
if denom == 0:
return np.nan
return float(np.sum(xc * yc) / denom)
def spearman_rho(x, y):
x, y = drop_nan_pairs(x, y)
rx = rankdata_average(x)
ry = rankdata_average(y)
return pearson_r(rx, ry)
x_demo = np.array([10, 20, 20, 40])
rankdata_average(x_demo)
array([1. , 2.5, 2.5, 4. ])
# Tie-free identity check: corr(ranks) == 1 - 6*sum(d^2)/(n*(n^2-1))
x = np.array([3, 1, 4, 2, 5])
y = np.array([30, 10, 40, 20, 60])
rx = rankdata_average(x)
ry = rankdata_average(y)
d2 = np.sum((rx - ry) ** 2)
n = x.size
rho_corr = spearman_rho(x, y)
rho_d2 = 1 - (6 * d2) / (n * (n**2 - 1))
rho_corr, rho_d2
(1.0, 1.0)
3) How to interpret ρ#
Sign: direction of monotonic association
ρ > 0: larger X tends to come with larger Y
ρ < 0: larger X tends to come with smaller Y
Magnitude: strength of monotonic ordering agreement
|ρ| = 1: perfect monotonic relationship (ranks are perfectly aligned or perfectly reversed)
ρ ≈ 0: no monotonic association (but there may still be a strong non-monotonic relationship!)
Effect size is context-dependent, but a rough mental scale sometimes used:
|ρ| ~ 0.1 small, ~0.3 medium, ~0.5+ large
Always remember: correlation describes association, not causation.
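As a quick sanity check of these sign and magnitude claims, here is a small sketch using the spearman_rho helper from section 2 (the data and the local rng_demo generator are illustrative choices):
rng_demo = np.random.default_rng(123)  # local generator so the notebook's global rng stays untouched
x_chk = rng_demo.uniform(0, 1, size=200)
print(spearman_rho(x_chk, np.exp(3 * x_chk)))            # 1.0: order preserved exactly
print(spearman_rho(x_chk, -x_chk**3))                    # -1.0: order reversed exactly
print(spearman_rho(x_chk, rng_demo.uniform(0, 1, 200)))  # near 0: independent noise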
4) The hypothesis test#
We test whether the monotonic association could plausibly be 0 in the population.
Typical hypotheses:
H₀: ρ = 0 (no monotonic association)
H₁: ρ ≠ 0 (two-sided), or ρ > 0 / ρ < 0 (one-sided)
A p-value answers:
“If there were no monotonic association, how often would we see a Spearman ρ at least this extreme just by chance?”
Why a permutation test?#
If H₀ is true, the pairing between X and Y is arbitrary. So we can:
keep X fixed
randomly permute Y many times
recompute ρ each time
The permutation distribution tells us how extreme our observed ρ is under H₀. This gives an (approximate) p-value without relying on normality assumptions.
def spearman_permutation_test(
x,
y,
n_resamples=5000,
alternative="two-sided",
seed=0,
return_null=False,
):
x, y = drop_nan_pairs(x, y)
n = x.size
if n < 2:
raise ValueError("Need at least 2 paired observations")
rx = rankdata_average(x)
ry = rankdata_average(y)
rx_c = rx - rx.mean()
ry_c = ry - ry.mean()
denom = np.sqrt(np.sum(rx_c**2) * np.sum(ry_c**2))
if denom == 0:
raise ValueError("Spearman correlation is undefined when ranks have zero variance")
rho_obs = float(np.sum(rx_c * ry_c) / denom)
rng_local = np.random.default_rng(seed)
rhos = np.empty(n_resamples, dtype=float)
for i in range(n_resamples):
ry_perm = rng_local.permutation(ry_c)
rhos[i] = np.sum(rx_c * ry_perm) / denom
if alternative == "two-sided":
extreme = np.abs(rhos) >= abs(rho_obs)
elif alternative == "greater":
extreme = rhos >= rho_obs
elif alternative == "less":
extreme = rhos <= rho_obs
else:
raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
p_value = float((extreme.sum() + 1) / (n_resamples + 1))
if return_null:
return rho_obs, p_value, rhos
return rho_obs, p_value
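A quick usage sketch on an illustrative toy pair before the full walkthrough in section 6 (rng_demo and the toy data are arbitrary choices; a local generator keeps the notebook's global rng untouched):
rng_demo = np.random.default_rng(123)
x_toy = rng_demo.normal(size=30)
y_toy = x_toy + rng_demo.normal(scale=0.5, size=30)
spearman_permutation_test(x_toy, y_toy, n_resamples=2000, seed=3)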
Confidence intervals (optional but useful)#
A p-value answers “is there evidence of association?” A confidence interval answers “how big might the association be?”
We can approximate a CI for ρ using the bootstrap:
resample paired observations (xᵢ, yᵢ) with replacement
compute ρ for each resample
take percentiles
def spearman_bootstrap_ci(
x,
y,
n_resamples=3000,
ci=0.95,
seed=0,
return_samples=False,
):
x, y = drop_nan_pairs(x, y)
n = x.size
if n < 2:
raise ValueError("Need at least 2 paired observations")
rng_local = np.random.default_rng(seed)
rhos = np.empty(n_resamples, dtype=float)
for i in range(n_resamples):
idx = rng_local.integers(0, n, size=n)
rhos[i] = spearman_rho(x[idx], y[idx])
alpha = (1 - ci) / 2
lo, hi = np.quantile(rhos, [alpha, 1 - alpha])
if return_samples:
return float(lo), float(hi), rhos
return float(lo), float(hi)
5) Visual intuition: monotonic vs non-monotonic#
We’ll compare Pearson vs Spearman on four toy datasets:
linear
monotonic but nonlinear
linear + outliers
non-monotonic (U-shape)
The goal is to build intuition for what changes when you switch from raw values to ranks.
n = 80
x1 = rng.uniform(0, 10, size=n)
y1 = 2 * x1 + rng.normal(0, 2, size=n)
x2 = rng.uniform(0, 5, size=n)
y2 = x2**2 + rng.normal(0, 1, size=n)
x3 = x1.copy()
y3 = y1.copy()
out_idx = rng.choice(n, size=4, replace=False)
y3[out_idx] += rng.normal(0, 30, size=4)
x4 = rng.uniform(0, 10, size=n)
y4 = (x4 - 5) ** 2 + rng.normal(0, 2, size=n)
datasets = [
("Linear", x1, y1),
("Monotonic nonlinear", x2, y2),
("Linear + outliers", x3, y3),
("Non-monotonic (U-shape)", x4, y4),
]
fig = make_subplots(
rows=2,
cols=2,
subplot_titles=[name for name, _, _ in datasets],
)
for i, (name, x, y) in enumerate(datasets):
row = 1 if i < 2 else 2
col = (i + 1) if i < 2 else (i - 1)
fig.add_trace(
go.Scatter(
x=x,
y=y,
mode="markers",
marker=dict(size=7, opacity=0.75),
showlegend=False,
),
row=row,
col=col,
)
pear = pearson_r(x, y)
spear = spearman_rho(x, y)
x_annot = float(np.quantile(x, 0.05))
y_annot = float(np.quantile(y, 0.95))
fig.add_annotation(
row=row,
col=col,
x=x_annot,
y=y_annot,
text=f"Pearson r = {pear:.3f}<br>Spearman ρ = {spear:.3f}",
showarrow=False,
align="left",
bgcolor="rgba(255,255,255,0.7)",
bordercolor="gray",
borderwidth=1,
)
fig.update_xaxes(title_text="x", row=row, col=col)
fig.update_yaxes(title_text="y", row=row, col=col)
fig.update_layout(
height=650,
width=1000,
title_text="Pearson vs Spearman: what changes?",
margin=dict(t=80),
)
fig
The rank view (why Spearman works for monotonic curves)#
Spearman ignores the actual spacing between values and keeps only the order. Plotting rank(x) vs rank(y) turns a monotonic relationship into something close to a straight line.
x = x2
y = y2
rx = rankdata_average(x)
ry = rankdata_average(y)
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=["Raw values", "Ranks (Spearman works here)"]
)
fig.add_trace(
go.Scatter(x=x, y=y, mode="markers", marker=dict(size=7, opacity=0.75), showlegend=False),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(
x=rx,
y=ry,
mode="markers",
marker=dict(size=7, opacity=0.75),
showlegend=False,
),
row=1,
col=2,
)
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_yaxes(title_text="y", row=1, col=1)
fig.update_xaxes(title_text="rank(x)", row=1, col=2)
fig.update_yaxes(title_text="rank(y)", row=1, col=2)
fig.update_layout(height=420, width=1000, title_text="Same data, different view")
fig
Key property: invariance to monotonic transformations#
Because Spearman uses ranks, any strictly increasing transform leaves the ordering unchanged. For strictly increasing $f$ and $g$:
$$\rho(x, y) = \rho(f(x), y) = \rho(x, g(y)) = \rho(f(x), g(y))$$
Pearson correlation generally changes under nonlinear transforms because it cares about the spacing between values.
x = x2
y = y2
x_t = x**3
y_pos = y - y.min() + 1
y_t = np.log(y_pos)
pear_orig = pearson_r(x, y)
pear_trans = pearson_r(x_t, y_t)
spear_orig = spearman_rho(x, y)
spear_trans = spearman_rho(x_t, y_t)
print(f"Pearson (original) = {pear_orig:.4f}")
print(f"Pearson (transformed) = {pear_trans:.4f}")
print(f"Spearman (original) = {spear_orig:.4f}")
print(f"Spearman (transformed) = {spear_trans:.4f}")
fig = px.bar(
{
"metric": ["Pearson", "Spearman", "Pearson", "Spearman"],
"version": ["original", "original", "transformed", "transformed"],
"value": [pear_orig, spear_orig, pear_trans, spear_trans],
},
x="metric",
y="value",
color="version",
barmode="group",
title="Spearman is invariant to monotonic transforms",
)
fig.update_yaxes(range=[-1, 1])
fig
Pearson (original) = 0.9582
Pearson (transformed) = 0.8908
Spearman (original) = 0.9550
Spearman (transformed) = 0.9550
6) Running the Spearman hypothesis test end-to-end#
We’ll use the monotonic nonlinear dataset and compute:
observed ρ
a permutation-based p-value (two-sided)
a bootstrap 95% CI
x = x2
y = y2
rho_obs, p_value, rho_null = spearman_permutation_test(
x,
y,
n_resamples=5000,
alternative="two-sided",
seed=1,
return_null=True,
)
ci_lo, ci_hi, rho_boot = spearman_bootstrap_ci(
x,
y,
n_resamples=3000,
ci=0.95,
seed=2,
return_samples=True,
)
print(
f"n={x.size} | Spearman ρ={rho_obs:.3f} | permutation p={p_value:.4f} | "
f"bootstrap 95% CI=({ci_lo:.3f}, {ci_hi:.3f})"
)
if scipy_spearmanr is not None:
rho_scipy, p_scipy = scipy_spearmanr(x, y)
print(f"SciPy spearmanr: ρ={rho_scipy:.3f} | p={p_scipy:.4f}")
n=80 | Spearman ρ=0.955 | permutation p=0.0002 | bootstrap 95% CI=(0.902, 0.980)
SciPy spearmanr: ρ=0.955 | p=0.0000
Visualizing the permutation null distribution#
Under H₀, ρ should bounce around 0. The p-value is the fraction of permuted correlations at least as extreme as the observed one.
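As a cross-check, the two-sided p-value can be recomputed directly from the stored null draws; the small difference comes from the +1 finite-sample correction in the function above:
frac_extreme = float(np.mean(np.abs(rho_null) >= abs(rho_obs)))
print(frac_extreme, p_value)  # p_value = (count + 1) / (n_resamples + 1), hence slightly larger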
fig = px.histogram(
x=rho_null,
nbins=50,
title="Permutation null distribution of Spearman ρ (H₀: no association)",
labels={"x": "ρ (permuted)"},
)
fig.add_vline(
x=rho_obs,
line_color="crimson",
line_width=3,
annotation_text=f"observed ρ={rho_obs:.3f}",
annotation_position="top right",
)
fig
Visualizing uncertainty with the bootstrap#
Bootstrap samples give an intuitive picture of how much ρ would vary if we repeated the study.
fig = px.histogram(
x=rho_boot,
nbins=50,
title="Bootstrap distribution of Spearman ρ",
labels={"x": "ρ (bootstrap)"},
)
fig.add_vline(x=ci_lo, line_color="gray", line_dash="dash", annotation_text=f"{ci_lo:.3f}")
fig.add_vline(x=ci_hi, line_color="gray", line_dash="dash", annotation_text=f"{ci_hi:.3f}")
fig.add_vline(
x=rho_obs,
line_color="crimson",
line_width=3,
annotation_text=f"observed {rho_obs:.3f}",
annotation_position="top right",
)
fig
7) Interpreting the result (what it means)#
Suppose you got:
ρ = 0.62
p = 0.003 (two-sided)
95% CI: (0.35, 0.80)
Interpretation:
Direction: positive monotonic association (higher X tends to come with higher Y)
Strength: moderately strong ordering agreement
Evidence: under H₀ of no monotonic association, such an extreme ρ would be rare (p small)
Uncertainty: plausible population ρ values are roughly in the CI range
Important caveats:
correlation ≠ causation
ρ near 0 does not rule out non-monotonic relationships
with very large n, tiny |ρ| can still produce tiny p-values (look at effect size + CI); a toy demonstration follows below
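To make this concrete, here is a toy sketch (sample size, effect size, and seeds are arbitrary choices):
rng_demo = np.random.default_rng(123)
x_big = rng_demo.normal(size=5000)
y_big = 0.05 * x_big + rng_demo.normal(size=5000)  # tiny true association
rho_big, p_big = spearman_permutation_test(x_big, y_big, n_resamples=2000, seed=7)
print(f"ρ={rho_big:.3f} | p={p_big:.4f}")  # |ρ| stays small, yet p will often look 'significant'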
8) Pitfalls & diagnostics#
Always plot the data (Spearman can miss U-shapes).
Many ties (discrete/rounded data) reduce power; Kendall’s τ is another option (see the comparison sketch after this list).
Outliers affect Spearman less than Pearson, but can still distort the story if they change the ordering.
Non-independence (time series, repeated measures) invalidates the usual p-value; you need a model that respects dependence.
Multiple testing: if you compute many correlations, adjust p-values or control FDR.
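For the ties pitfall above, a rough side-by-side of Spearman and Kendall on heavily rounded data; this sketch assumes SciPy is installed:
from scipy.stats import kendalltau, spearmanr

rng_demo = np.random.default_rng(7)
x_tied = np.round(rng_demo.uniform(0, 5, size=100))          # many ties
y_tied = np.round(x_tied + rng_demo.normal(0, 1, size=100))  # noisy and also tied
rho_t, p_rho = spearmanr(x_tied, y_tied)
tau_t, p_tau = kendalltau(x_tied, y_tied)
print(f"Spearman ρ={rho_t:.3f} (p={p_rho:.4f}) | Kendall τ={tau_t:.3f} (p={p_tau:.4f})")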
9) Exercises#
Implement another tie method (e.g., "min" ranking) and compare results; a starting sketch follows after this list.
Create a dataset with a clear U-shape and confirm ρ ≈ 0 even though the relationship is strong.
Increase n and reduce the noise; watch how p-values shrink even when ρ changes only a little.
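For the first exercise, one possible starting point is a "min" tie method mirroring rankdata_average above (a sketch, not the only approach):
def rankdata_min(x):
    # "min" tie method: every tied value gets the smallest rank in its group
    x = np.asarray(x)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size, dtype=float)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i : j + 1]] = i + 1.0  # smallest rank in the tie group
        i = j + 1
    return ranks

rankdata_min(np.array([10, 20, 20, 40]))  # expected: array([1., 2., 2., 4.])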
References#
Spearman, C. (1904). The proof and measurement of association between two things.
SciPy: scipy.stats.spearmanr